Exploiting confusion matrices for automatic generation of topic hierarchies and scaling up multi-way classifiers
Abstract
A common way to evaluate a multi-way classifier is a confusion matrix that tabulates, for each learned concept, the true class of test instances against the predicted class. Aggregate accuracy figures for the classifier are obtained by summing the diagonal entries of the confusion matrix (normalized by the number of test instances). However, the off-diagonal entries carry valuable information about the relationships among classes, and this information is often ignored. In this report we show various ways in which the notion of similarity among subsets of classes, derived from the confusion matrix, can be exploited. First, we provide a mechanism for generating more meaningful intermediate levels of hierarchy in large flat sets of classes. This provides a valuable navigational aid for browsing large text collections such as Internet directories. Second, we show how multi-class classification can be scaled to a large number of classes. This aspect of text classification has been ignored in much existing work. Newer methods such as Support Vector Machines (SVMs) achieve high accuracy but are expensive to run, do not scale to large numbers of classes, and are not inherently designed for multi-class tasks. We propose a two-stage scheme in which the confusion matrix of a fast classifier of mediocre accuracy, such as naive Bayes, is used to derive a graph in which classes are linked according to their degree of confusion with each other. For each class we then identify the subgraph of classes that are confused with it. This breaks the initial large multi-class problem into smaller subtasks in which, for each class, only its relevant subgraph needs to be considered. We use high-accuracy, expensive classifiers such as SVMs for these subtasks. The results are promising, with significant performance gains of the graph-based method over multi-class SVM classifiers. The resulting accuracy is also significantly higher than that of the original naive Bayes classifier and comparable to that of the best multi-class SVM classifier.
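The first stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the graph is derived from the confusion matrix, so everything specific here is an assumption made for illustration: the row normalization, the `threshold` parameter, the symmetric linking rule, the function names `accuracy`, `confusion_neighbors`, and `subtask_classes`, and the toy matrix itself.

```python
def accuracy(cm):
    """Aggregate accuracy: sum of diagonal entries over all test instances."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def confusion_neighbors(cm, threshold=0.05):
    """Link classes i and j when the fraction of class-i test instances
    predicted as j reaches `threshold` (an assumed linking rule).
    Links are treated as undirected. Rows are true classes, columns
    are predicted classes."""
    n = len(cm)
    row_totals = [sum(row) for row in cm]
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        if row_totals[i] == 0:
            continue  # no test instances for this class
        for j in range(n):
            if i != j and cm[i][j] / row_totals[i] >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return {i: sorted(s) for i, s in neighbors.items()}


def subtask_classes(i, neighbors):
    """Classes that a second-stage classifier for class i would consider:
    the class itself plus the classes confused with it."""
    return [i] + neighbors[i]


# Hypothetical 4-class confusion matrix from a fast first-stage classifier.
cm = [
    [40, 8, 1, 1],
    [6, 42, 1, 1],
    [0, 1, 45, 4],
    [1, 0, 9, 40],
]

neighbors = confusion_neighbors(cm, threshold=0.05)
print(accuracy(cm))
print(neighbors)
print(subtask_classes(2, neighbors))
```

In this toy matrix, classes 0 and 1 are confused mainly with each other, as are classes 2 and 3, so each second-stage subtask involves two classes rather than four. A more expensive classifier would then be trained only on each such subset.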
Similar resources
Inter-class relationships in text classification
Text classification is an active research area motivated by many real-world applications. Even so, research formulations and prototypes often make assumptions that are not suitable for deployment. For example, in many real applications, the set of class labels keeps evolving, continual user feedback must be integrated into the classifier, and test documents may come from a population statistica...
Automatic Generation of a Multi Agent System for Crisis Management by a Model Driven Approach
Considering the increasing occurrences of unexpected events and the need for pre-crisis planning in order to reduce risks and losses, modeling instant response environments is needed more than ever. Modeling may lead to more careful planning for crisis-response operations, such as team formation, task assignment, and doing the task by teams. A common challenge in this way is that the model shou...
A new 2D block ordering system for wavelet-based multi-resolution up-scaling
A complete and accurate analysis of the complex spatial structure of heterogeneous hydrocarbon reservoirs requires detailed geological models, i.e. fine resolution models. Due to the high computational cost of simulating such models, single resolution up-scaling techniques are commonly used to reduce the volume of the simulated models at the expense of losing the precision. Several multi-scale ...
Memory Bottlenecks and Memory Contention in Multi-Core Monte Carlo Transport Codes
Current and next generation processor designs require exploiting on-chip, fine-grained parallelism to achieve a significant fraction of theoretical peak CPU speed. The success or failure of these designs will have a tremendous impact on the performance and scaling of a number of key reactor physics algorithms run on next-generation computer architectures. One key example is the Monte Carlo (MC)...
On learning hierarchical classifications
Many significant real-world classification tasks involve a large number of categories which are arranged in a hierarchical structure; for example, classifying documents into subject categories under the library of congress scheme, or classifying world-wide-web documents into topic hierarchies. We investigate the potential benefits of using a given hierarchy over base classes to learn accurate m...
Publication date: 2002